OcrV1, Main, Exploration, bibRecord, 000060

Historical handwriting representation model dedicated to word spotting application

Internal identifier: 000060 (Main/Exploration); previous: 000059; next: 000061

Authors: Peng Wang [France]

RBID: Hal:tel-01312213

French descriptors

mix:
- Contexte de forme, Modèle de représentation, Recherche par similarité, Reconnaissance de mots.

English descriptors

mix:
- Comprehensive representation model, Graph-based, Shape context, Word spotting.

Abstract

As more and more documents, especially historical handwritten documents, are digitized for long-term preservation, the demand for efficient information retrieval techniques for such document images is increasing. The objective of this research is to establish an effective representation model for handwriting, especially historical manuscripts. The proposed model is intended to help navigation in historical document collections. Specifically, we developed our handwriting representation model with regard to the word spotting application. As a specific pattern recognition task, handwritten word spotting faces many challenges, such as high intra-writer and inter-writer variability. It is now widely accepted that OCR techniques perform poorly on offline handwritten documents, especially historical ones. Therefore, characterization and comparison methods dedicated to handwritten word spotting are strongly required. In this work, we explore several techniques that allow the retrieval of single-style handwritten document images with a query image. The proposed representation model covers two facets of handwriting, morphology and topology. Based on the skeleton of the handwriting, graphs are constructed with the structural points as the vertices and the strokes as the edges. By assigning the Shape Context descriptor as the label of each vertex, the contextual information of the handwriting is also integrated. Moreover, we develop a coarse-to-fine system for large-scale handwritten word spotting using our representation model. In the coarse selection, graph embedding is adapted for simple and fast computation. In the fine selection, applied to the selected regions of interest, a specific similarity measure based on graph edit distance is designed. Given the importance of the order of handwriting, dynamic time warping assignment with block merging is added.
The experimental results on benchmark handwriting datasets demonstrate the power of the proposed representation model and the efficiency of the developed word spotting approach. The main contribution of this work is the proposed graph-based representation model, which provides a comprehensive description of handwriting, especially historical script. Our structure-based model captures the essential characteristics of handwriting without redundancy, while remaining robust to the intra-variation of handwriting and to specific noise. With additional experiments, we have also shown the potential of the proposed representation model in other symbol recognition applications, such as the classification of handwritten musical and architectural symbols.
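
The abstract mentions dynamic time warping (DTW) as the tool used to respect the writing order. The sketch below is only the classical DTW recurrence on two numeric sequences, which may help readers unfamiliar with the technique. It is not the thesis's method: the "assignment with block merging" variant is not described in this record, and the function name `dtw_distance`, the `cost` parameter, and the use of 1-D sequences are assumptions made for illustration.

```python
from typing import Callable, Sequence


def dtw_distance(
    a: Sequence[float],
    b: Sequence[float],
    cost: Callable[[float, float], float] = lambda x, y: abs(x - y),
) -> float:
    """Classical DTW alignment cost (illustrative; no block merging)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # dp[i][j] holds the minimal cost of aligning a[:i] with b[:j].
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            # Extend the cheapest of: advance in a, advance in b, advance in both.
            dp[i][j] = step + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


if __name__ == "__main__":
    # A sequence and a time-stretched copy of it align at zero cost.
    print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
```

In a word spotting setting the sequence elements would more plausibly be per-vertex or per-column descriptors compared with a vector distance passed as `cost`; that mapping is a guess and is not taken from the record.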

URL:

https://tel.archives-ouvertes.fr/tel-01312213

Links toward previous steps (curation, corpus...)

to stream Hal, to step Corpus: 000056
to stream Hal, to step Curation: 000056
to stream Hal, to step Checkpoint: 000020
to stream Main, to step Merge: 000060
to stream Main, to step Curation: 000060

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Historical handwriting representation model dedicated to word spotting application</title>
<title xml:lang="fr">Modèle de représentation des écritures pour la recherche de mots par similarité dans les documents manuscrits du patrimoine</title>
<author><name sortKey="Wang, Peng" sort="Wang, Peng" uniqKey="Wang P" first="Peng" last="Wang">Peng Wang</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-17835" status="VALID"><orgName>Laboratoire Hubert Curien [Saint Etienne]</orgName>
<orgName type="acronym">LHC</orgName>
<desc><address><addrLine>18 rue du Professeur Lauras 42000 SAINT-ETIENNE</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://laboratoirehubertcurien.fr</ref>
</desc>
<listRelation><relation active="#struct-300284" type="direct"></relation>
<relation name="UMR5516" active="#struct-441569" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300284" type="direct"><org type="institution" xml:id="struct-300284" status="VALID"><orgName>Université Jean Monnet - Saint-Etienne</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="UMR5516" active="#struct-441569" type="direct"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Saint-Étienne</settlement>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
<orgName type="university">Université Jean Monnet Saint-Etienne</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lyon</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:tel-01312213</idno>
<idno type="halId">tel-01312213</idno>
<idno type="halUri">https://tel.archives-ouvertes.fr/tel-01312213</idno>
<idno type="url">https://tel.archives-ouvertes.fr/tel-01312213</idno>
<date when="2014-11-18">2014-11-18</date>
<idno type="wicri:Area/Hal/Corpus">000056</idno>
<idno type="wicri:Area/Hal/Curation">000056</idno>
<idno type="wicri:Area/Hal/Checkpoint">000020</idno>
<idno type="wicri:Area/Main/Merge">000060</idno>
<idno type="wicri:Area/Main/Curation">000060</idno>
<idno type="wicri:Area/Main/Exploration">000060</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Historical handwriting representation model dedicated to word spotting application</title>
<title xml:lang="fr">Modèle de représentation des écritures pour la recherche de mots par similarité dans les documents manuscrits du patrimoine</title>
<author><name sortKey="Wang, Peng" sort="Wang, Peng" uniqKey="Wang P" first="Peng" last="Wang">Peng Wang</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-17835" status="VALID"><orgName>Laboratoire Hubert Curien [Saint Etienne]</orgName>
<orgName type="acronym">LHC</orgName>
<desc><address><addrLine>18 rue du Professeur Lauras 42000 SAINT-ETIENNE</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://laboratoirehubertcurien.fr</ref>
</desc>
<listRelation><relation active="#struct-300284" type="direct"></relation>
<relation name="UMR5516" active="#struct-441569" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300284" type="direct"><org type="institution" xml:id="struct-300284" status="VALID"><orgName>Université Jean Monnet - Saint-Etienne</orgName>
<desc><address><country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="UMR5516" active="#struct-441569" type="direct"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="IdRef">02636817X</idno>
<idno type="ISNI">0000000122597504</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Saint-Étienne</settlement>
<region type="region" nuts="2">Auvergne-Rhône-Alpes</region>
<region type="old region" nuts="2">Rhône-Alpes</region>
</placeName>
<orgName type="university">Université Jean Monnet Saint-Etienne</orgName>
<orgName type="institution" wicri:auto="newGroup">Université de Lyon</orgName>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="mix" xml:lang="en"><term>Comprehensive representation model</term>
<term>Graph-based</term>
<term>Shape context</term>
<term>Word spotting</term>
</keywords>
<keywords scheme="mix" xml:lang="fr"><term>Contexte de forme</term>
<term>Modèle de représentation</term>
<term>Recherche par similarité</term>
<term>Reconnaissance de mots</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">As more and more documents, especially historical handwritten documents, are converted into digitized version for long-term preservation, the demands for efficient information retrieval techniques in such document images are increasing. The objective of this research is to establish an effective representation model for handwriting, especially historical manuscripts. The proposed model is supposed to help the navigation in historical document collections. Specifically speaking, we developed our handwriting representation model with regards to word spotting application. As a specific pattern recognition task, handwritten word spotting faces many challenges such as the high intra-writer and inter-writer variability. Nowadays, it has been admitted that OCR techniques are unsuccessful in handwritten offline documents, especially historical ones. Therefore, the particular characterization and comparison methods dedicated to handwritten word spotting are strongly required. In this work, we explore several techniques that allow the retrieval of singlestyle handwritten document images with query image. The proposed representation model contains two facets of handwriting, morphology and topology. Based on the skeleton of handwriting, graphs are constructed with the structural points as the vertexes and the strokes as the edges. By signing the Shape Context descriptor as the label of vertex, the contextual information of handwriting is also integrated. Moreover, we develop a coarse-to-fine system for the large-scale handwritten word spotting using our representation model. In the coarse selection, graph embedding is adapted with consideration of simple and fast computation. With selected regions of interest, in the fine selection, a specific similarity measure based on graph edit distance is designed. Regarding the importance of the order of handwriting, dynamic time warping assignment with block merging is added. 
The experimental results using benchmark handwriting datasets demonstrate the power of the proposed representation model and the efficiency of the developed word spotting approach. The main contribution of this work is the proposed graph-based representation model, which realizes a comprehensive description of handwriting, especially historical script. Our structure-based model captures the essential characteristics of handwriting without redundancy, and meanwhile is robust to the intra-variation of handwriting and specific noises. With additional experiments, we have also proved the potential of the proposed representation model in other symbol recognition applications, such as handwritten musical and architectural classification</div>
</front>
</TEI>
<affiliations><list><country><li>France</li>
</country>
<region><li>Auvergne-Rhône-Alpes</li>
<li>Rhône-Alpes</li>
</region>
<settlement><li>Saint-Étienne</li>
</settlement>
<orgName><li>Université Jean Monnet Saint-Etienne</li>
<li>Université de Lyon</li>
</orgName>
</list>
<tree><country name="France"><region name="Auvergne-Rhône-Alpes"><name sortKey="Wang, Peng" sort="Wang, Peng" uniqKey="Wang P" first="Peng" last="Wang">Peng Wang</name>
</region>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000060 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000060 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Hal:tel-01312213
   |texte=   Historical handwriting representation model dedicated to word spotting application
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development! It was generated automatically from raw corpora, so the information has not been validated.
